
[REFACTOR] Replace in-tree cache_mem with CacheSeek integration #4

Open
yJader wants to merge 5 commits into Tele-AI:main from yJader:refactor/cacheseek-adapt


yJader commented Jul 1, 2026


Co-authored-by: @yx0716

Description

This PR replaces TeleFuser's in-tree latent cache implementation with an optional CacheSeek integration path. It wires CacheSeek into the service container, task service, CLI flags, Wan2.2 service examples, and LingBot World Fast world-KV hooks while keeping latent cache disabled unless explicitly requested.

Motivation

Cross-request latent/KV reuse is now owned by CacheSeek instead of TeleFuser-local cache_mem code. TeleFuser should depend on CacheSeek only when the feature is enabled, fail clearly when CacheSeek is missing, and keep the default import/runtime path lightweight when latent cache is disabled.
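The lazy-import and fail-fast behavior described above can be sketched with the standard library alone. This is a minimal sketch, not TeleFuser's code: `load_cacheseek` and its `module_name` parameter are hypothetical, and only the import decision is shown, not how the CacheSeek service is constructed afterwards.

```python
import importlib
from types import ModuleType
from typing import Optional


def load_cacheseek(enabled: bool, module_name: str = "cacheseek") -> Optional[ModuleType]:
    """Import the optional cache backend only when latent cache is enabled.

    Hypothetical helper for illustration; not TeleFuser's actual API.
    """
    if not enabled:
        # Default path: the optional dependency is never imported.
        return None
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Fail at startup with an actionable message instead of a late, deep traceback.
        raise RuntimeError(
            f"latent cache is enabled but '{module_name}' could not be imported; "
            "install it or start without latent cache"
        ) from exc
```

Keeping the import inside the function is what keeps the disabled path free of the dependency; a module-level `import cacheseek` would break every import of the service module on machines without CacheSeek.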

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Performance improvement
  • Code refactoring
  • Documentation update
  • Other (please describe):

Changes Made

  • Removed the in-tree telefuser/cache_mem implementation and related unit tests.
  • Added lazy CacheSeek service initialization in TeleFuser service container and task service code.
  • Added/updated latent cache CLI and server config plumbing, including direct failure when latent cache is enabled but CacheSeek is unavailable.
  • Updated Wan2.2 T2V service examples for CacheSeek-backed latent cache and added a nocache service example.
  • Added LingBot World Fast world_kv_binding runtime hooks so CacheSeek exact-prefix reuse can fast-forward cached chunks.
  • Updated English and Chinese latent cache docs with CacheSeek usage and the CacheSeek GitHub link: https://github.com/Tele-AI/CacheSeek.
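Exact-prefix reuse can only skip a leading run of chunks whose inputs match a cached run exactly; the first mismatch ends the reusable prefix, and the e2e log describes the skipped chunks as decode-only. The sketch below assumes each chunk is identified by a hypothetical string key (for example a hash of its conditioning inputs). `fast_forward_count` and `plan_chunks` are illustrative names, not the world_kv_binding API.

```python
from typing import List, Sequence


def fast_forward_count(request_keys: Sequence[str], cached_keys: Sequence[str]) -> int:
    """Length of the exact common prefix between request and cached chunk keys."""
    k = 0
    for requested, cached in zip(request_keys, cached_keys):
        if requested != cached:
            break  # the first mismatch ends the reusable prefix
        k += 1
    return k


def plan_chunks(request_keys: Sequence[str], cached_keys: Sequence[str]) -> List[str]:
    """Mark the matched prefix as decode-only and every later chunk as generated."""
    k = fast_forward_count(request_keys, cached_keys)
    return ["decode-only"] * k + ["generate"] * (len(request_keys) - k)
```

For example, with one cached chunk whose key matches the request's first chunk, `plan_chunks(["c0", "c1", "c2"], ["c0"])` fast-forwards one chunk and generates the other two.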

Testing

  • Unit tests pass (pytest tests/)
  • Manual testing performed
  • Benchmarks added/updated (if applicable)

Test commands:

# TeleFuser targeted latent-cache/service tests.
python -m pytest \
  tests/unit/service/test_latent_cache_cli.py \
  tests/unit/service/test_latent_cache_task_service.py \
  tests/unit/pipelines/wan_video/test_service_examples.py \
  tests/unit/pipelines/wan_video/test_latent_data_utils.py -q
# Result: 15 passed

# Real Wan2.2 service e2e without latent cache.
# Model: Wan2.2-T2V-A14B
# Config: num_inference_steps=2, num_frames=5, resolution=480p, parallelism=1.
# Result: completed; non-empty mp4 generated.

# Real Wan2.2 service e2e with CacheSeek enabled.
# Model: Wan2.2-T2V-A14B
# Cache mode: read_write
# Result: task_status=completed; non-empty mp4 generated.
# Cache evidence: audit log contains lookup_hit skip_step=1 and save_stored.

# Real LingBot World Fast exact-prefix e2e through TeleFuser world_kv hooks.
# Note: uses CacheSeek as an external dependency; this PR only includes TeleFuser-side hooks.
export CUDA_VISIBLE_DEVICES=0,1
export LINGBOT_WORLD_CHECKPOINT_DIR=<lingbot-world-fast-checkpoint-root>
export WORLDKV_REPO_ROOTS=<telefuser-repo>:<cacheseek-repo>
export PYTHONPATH=<telefuser-repo>:<cacheseek-repo>:${PYTHONPATH:-}
cd <cacheseek-repo>
python \
  examples/exact_prefix_reuse/e2e_telefuser_lingbot.py \
  --frame-num 13 \
  --prefix-chunks 1 \
  --out-dir <output-dir> \
  --image-path <lingbot-example-image> \
  --action-path <lingbot-example-action-dir> \
  --aux-device cuda:1 \
  --no-save-videos
# Result: all_pass=true; fast_forward_k A=0, B=1, C=1, D=0.

Additional validation notes:

  • Wan2.2 CacheSeek audit log contained lookup_hit skip_step=1 and save_stored.
  • LingBot e2e manifest reported all_pass=true.
  • LingBot e2e log reported world_kv: fast-forward 1 chunks (decode-only).
  • GPUs were checked after e2e runs and had no remaining compute processes.

Checklist

  • Code follows the project's coding standards (ruff)
  • Pre-commit hooks pass (pre-commit run --all-files)
  • All tests pass (pytest tests/)
  • New tests added for new functionality
  • Documentation updated (README, CLAUDE.md, docstrings)
  • Commit messages are clear and descriptive
  • PR title follows the convention: [TYPE] Brief description

Related Issues

N/A

Additional Notes

This PR is scoped to the TeleFuser-side CacheSeek adaptation. It does not include CacheSeek repository changes. The LingBot e2e command above exercises CacheSeek as an external dependency to verify that the TeleFuser world_kv_binding hooks are usable end to end.

GPU Architecture Support

  • SM80/SM89 (Ampere, Ada Lovelace)
  • SM90 (Hopper H100)
  • SM100+ (Blackwell)

No kernel-specific code was changed. Real e2e validation ran on NVIDIA H100.

Performance Impact

No kernel-level performance change is intended. CacheSeek reuse can reduce repeated work when enabled. The LingBot exact-prefix smoke e2e showed functional reuse:

  • A cold run: 7.456s
  • B full hit: 1.519s
  • C prefix hit: 1.494s
  • D cold fork reference: 4.628s

These are smoke e2e timings on H100 and should not be treated as a formal benchmark.
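Expressed as ratios against the cold run A, the two cache-hit timings work out as follows. This is plain arithmetic on the numbers listed above and carries the same smoke-test caveat; D is a separate cold reference and is left out of the ratios.

```python
# Smoke e2e timings from the list above, in seconds (H100).
TIMINGS = {
    "A cold run": 7.456,
    "B full hit": 1.519,
    "C prefix hit": 1.494,
    "D cold fork reference": 4.628,
}


def speedup(baseline_s: float, run_s: float) -> float:
    """How many times faster run_s is than baseline_s."""
    return baseline_s / run_s


for label in ("B full hit", "C prefix hit"):
    ratio = speedup(TIMINGS["A cold run"], TIMINGS[label])
    saved = 1.0 - TIMINGS[label] / TIMINGS["A cold run"]
    print(f"{label}: {ratio:.2f}x faster than A, {saved:.0%} less wall time")
```

Both hits land at roughly 4.9x to 5.0x the cold run, i.e. about 80% less wall time.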

yx0716 and others added 5 commits June 18, 2026 15:50
Replace the in-tree telefuser/cache_mem cache with cacheseek as the
cross-request cache middleware.

- service (container/task_service/api_server): build and drive
  (CacheService, TeleFuserCacheAdapter); per request build_query ->
  lookup -> apply_resume -> on_response -> save
- lingbot_world_fast: world_kv hooks (on_runtime_created /
  on_chunk_finalized) + decode-only fast path for exact-prefix KV reuse;
  enable rolling KV window (local_attn_size=7, sink_size=3)
- remove legacy telefuser/cache_mem + service/cache/cache_factory|
  cache_service and the cache_mem unit tests
- pin torch==2.7.0 + torchvision==0.22.0
- docs: update latent_cache (en/zh)
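The rolling KV window with a sink (local_attn_size=7, sink_size=3) keeps the earliest entries pinned and rolls the rest. The sketch below is an illustration of that general sink-plus-window policy, not LingBot World Fast's code. Two assumptions are labeled: the sink entries count toward the local_attn_size budget, and an "entry" is whatever unit the pipeline windows over (the commit does not say whether that is frames, chunks or tokens).

```python
from typing import List


def retained_kv_indices(num_entries: int, local_attn_size: int, sink_size: int) -> List[int]:
    """Indices of KV entries kept under a sink + rolling-window policy.

    Assumption: sink entries count toward the local_attn_size budget,
    so at most local_attn_size entries are retained in total.
    """
    if not 0 <= sink_size <= local_attn_size:
        raise ValueError("expected 0 <= sink_size <= local_attn_size")
    if num_entries <= local_attn_size:
        # Window not full yet: nothing is evicted.
        return list(range(num_entries))
    recent = local_attn_size - sink_size
    # The earliest sink_size entries stay pinned; only the newest `recent` entries roll.
    return list(range(sink_size)) + list(range(num_entries - recent, num_entries))
```

With 10 entries, local_attn_size=7 and sink_size=3, this keeps entries 0 to 2 plus the four newest (6 to 9), so memory stays bounded no matter how long generation runs.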
…arch-v2)

arch-v2 retired cacheseek.core; CacheConfig is now exported from the top-level `cacheseek`
package. The cache and nocache wan22 T2V service entry points still imported arch-v1's
cacheseek.core.config, so the cacheseek approximate-reuse e2e crashed with ModuleNotFoundError during service startup.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
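The commit above fixes the crash by importing CacheConfig from the new top-level location. If a service ever had to tolerate both layouts during a migration, one option is an ordered import fallback; this is a hypothetical pattern for illustration, not what the commit does, and `import_attr` is an invented helper name.

```python
import importlib
from typing import Any, List, Sequence


def import_attr(name: str, module_candidates: Sequence[str]) -> Any:
    """Return attribute `name` from the first candidate module that provides it."""
    errors: List[str] = []
    for module_name in module_candidates:
        try:
            module = importlib.import_module(module_name)
        except ModuleNotFoundError as exc:
            errors.append(f"{module_name}: {exc}")
            continue
        if hasattr(module, name):
            return getattr(module, name)
        errors.append(f"{module_name}: no attribute {name!r}")
    # Report every location tried so a layout change is easy to diagnose.
    raise ImportError(f"could not import {name!r}; tried: " + "; ".join(errors))
```

Under this sketch, `import_attr("CacheConfig", ["cacheseek", "cacheseek.core.config"])` would prefer the arch-v2 location and fall back to the arch-v1 path. Pinning a single supported layout, as the commit does, is simpler when only one CacheSeek version needs to be supported.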
yJader marked this pull request as ready for review July 1, 2026 14:03
lzx1413 requested a review from Copilot July 2, 2026 02:22

Copilot AI left a comment


Pull request overview

This PR refactors TeleFuser’s latent cache feature by removing the in-tree cache_mem implementation and replacing it with an optional CacheSeek-backed integration. The integration is wired through the service container / task service, exposed via CLI flags, and documented (EN/ZH), while preserving “cache disabled by default” behavior.

Changes:

  • Replace TeleFuser-local latent cache wiring with CacheSeek (cache_service, cache_adapter) lifecycle hooks (lookup/resume/save) and fail-fast startup when enabled but CacheSeek is missing.
  • Add CLI/server config plumbing for --enable-latent-cache and --cache-mode, plus unit tests covering lazy import and failure semantics.
  • Update Wan2.2 service examples and LingBot World Fast runtime hooks for CacheSeek reuse, and refresh latent-cache docs (EN/ZH).
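The per-request order given in the first commit message (build_query -> lookup -> apply_resume -> on_response -> save) can be sketched as a small driver. This is hypothetical: in the PR the hooks are split between CacheService and TeleFuserCacheAdapter, and the single `CacheHooks` protocol, the signatures, and the rules "resume only on a hit" and "save only when on_response returns an entry" are assumptions, not CacheSeek's API.

```python
from typing import Any, Callable, Optional, Protocol


class CacheHooks(Protocol):
    """Hypothetical interface mirroring the hook names used in this PR."""

    def build_query(self, request: Any) -> Any: ...
    def lookup(self, query: Any) -> Optional[Any]: ...
    def apply_resume(self, request: Any, hit: Any) -> Any: ...
    def on_response(self, request: Any, result: Any) -> Optional[Any]: ...
    def save(self, query: Any, entry: Any) -> None: ...


def run_with_cache(hooks: CacheHooks, request: Any, generate: Callable[[Any], Any]) -> Any:
    query = hooks.build_query(request)  # derive the cache key/query from the request
    hit = hooks.lookup(query)
    if hit is not None:
        # On a hit, let the cache rewrite the request, e.g. to skip already-computed steps.
        request = hooks.apply_resume(request, hit)
    result = generate(request)  # the normal pipeline run
    entry = hooks.on_response(request, result)  # extract what is worth storing
    if entry is not None:
        hooks.save(query, entry)
    return result
```

Keeping the driver ignorant of cache internals is what lets the task service treat CacheSeek as an optional plug-in: with latent cache disabled, the service simply calls `generate(request)` directly.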

Reviewed changes

Copilot reviewed 47 out of 53 changed files in this pull request and generated 2 comments.

File Description
tools/viewer/weight_viewer.py Minor formatting cleanup in number formatting.
tools/deploy/show_stat.py Minor formatting/quoting cleanup in output strings.
tools/deploy/docker_monitor.py Minor formatting cleanup in output strings.
tests/unit/service/test_latent_cache_task_service.py New test validating CacheSeek lifecycle calls from MediaGenerationService.
tests/unit/service/test_latent_cache_cli.py New CLI/container tests for lazy import and fail-fast behavior.
tests/unit/pipelines/wan_video/test_service_examples.py New tests ensuring service examples import without CacheSeek present.
tests/unit/cache_mem/test_types_and_config.py Removed legacy cache_mem unit tests.
tests/unit/cache_mem/test_storage.py Removed legacy cache_mem unit tests.
tests/unit/cache_mem/test_metadata.py Removed legacy cache_mem unit tests.
tests/unit/cache_mem/test_concurrency.py Removed legacy cache_mem concurrency tests.
tests/unit/cache_mem/__init__.py Removed legacy cache_mem test package init.
telefuser/service/main.py Thread latent-cache flags into server config at startup.
telefuser/service/core/task_service.py Switch cache flow to CacheSeek adapter (build_query/lookup/apply_resume/on_response/save).
telefuser/service/core/container.py Lazy-import CacheSeek factory; fail fast on missing/failed init; store adapter in container.
telefuser/service/core/config.py Add cache_mode to ServerConfig for service-level override plumbing.
telefuser/service/cache/cache_service.py Removed legacy TeleFuser cache service implementation.
telefuser/service/cache/cache_factory.py Removed legacy TeleFuser cache factory implementation.
telefuser/service/cache/__init__.py Mark legacy cache namespace as deprecated (no longer a facade).
telefuser/service/api/api_server.py Forward cache_adapter into API service initialization.
telefuser/pipelines/lingbot_world_fast/session.py Add optional world_kv_binding + runtime state for cached-latent fast-forward.
telefuser/pipelines/lingbot_world_fast/pipeline.py Add world-KV fast-forward hook points and decode-only cached chunk path.
telefuser/entrypoints/cli/main.py Add --enable-latent-cache and --cache-mode options and forward into run_server.
telefuser/cache_mem/vector_store/qdrant.py Removed legacy cache_mem vector store code.
telefuser/cache_mem/vector_store/interfaces.py Removed legacy cache_mem vector store code.
telefuser/cache_mem/vector_store/faiss.py Removed legacy cache_mem vector store code.
telefuser/cache_mem/vector_store/__init__.py Removed legacy cache_mem vector store exports.
telefuser/cache_mem/strategies.py Removed legacy cache_mem strategy implementation/registry.
telefuser/cache_mem/storage/memory.py Removed legacy cache_mem storage backend.
telefuser/cache_mem/storage/local_file.py Removed legacy cache_mem storage backend.
telefuser/cache_mem/storage/interfaces.py Removed legacy cache_mem storage interfaces.
telefuser/cache_mem/storage/fluxon.py Removed legacy cache_mem storage stub.
telefuser/cache_mem/storage/__init__.py Removed legacy cache_mem storage exports.
telefuser/cache_mem/state/interfaces.py Removed legacy cache_mem state interfaces.
telefuser/cache_mem/src/models/qwen3_vl_reranker.py Removed legacy cache_mem model code.
telefuser/cache_mem/src/models/qwen3_vl_embedding.py Removed legacy cache_mem model code.
telefuser/cache_mem/metadata.py Removed legacy cache_mem metadata manager.
telefuser/cache_mem/log_monitor.py Removed legacy cache_mem log sink utilities.
telefuser/cache_mem/latent_cache.py Removed legacy cache_mem LatentCache facade.
telefuser/cache_mem/encoding/interfaces.py Removed legacy cache_mem encoder interfaces.
telefuser/cache_mem/encoders.py Removed legacy cache_mem encoder wiring.
telefuser/cache_mem/connection.py Removed legacy cache_mem connection manager.
telefuser/cache_mem/config.py Removed legacy cache_mem config types.
telefuser/cache_mem/cache_types.py Removed legacy cache_mem cache result/types.
telefuser/cache_mem/__init__.py Removed legacy cache_mem package facade.
pyproject.toml Pin torch/torchvision and update cache extra description to reflect CacheSeek usage.
examples/wan_video/wan22_14b_text_to_video_service.py Update service example docs/config for CacheSeek-based lifecycle.
examples/wan_video/wan22_14b_text_to_video_service_nocache.py New “no-cache” Wan2.2 service example variant.
docs/zh/latent_cache.md Rewrite doc to describe CacheSeek integration and updated service flow.
docs/en/latent_cache.md Rewrite doc to describe CacheSeek integration and updated service flow.
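The --enable-latent-cache and --cache-mode plumbing listed for the CLI and ServerConfig can be illustrated with argparse. This is a sketch only: TeleFuser's CLI may use a different framework, `CacheFlags` and `parse_cache_flags` are hypothetical names, and "read_write" is the only cache mode value this PR mentions, so no list of valid choices is invented here.

```python
import argparse
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class CacheFlags:
    """Hypothetical container for the two cache-related server options."""

    enable_latent_cache: bool = False  # latent cache stays off unless requested
    cache_mode: Optional[str] = None  # e.g. "read_write"; None is assumed to mean "use the default"


def parse_cache_flags(argv: Optional[Sequence[str]] = None) -> CacheFlags:
    parser = argparse.ArgumentParser(description="cache flag sketch")
    parser.add_argument("--enable-latent-cache", action="store_true")
    parser.add_argument("--cache-mode", default=None)
    args = parser.parse_args(argv)
    # argparse turns dashes into underscores for attribute names.
    return CacheFlags(enable_latent_cache=args.enable_latent_cache, cache_mode=args.cache_mode)
```

For example, `parse_cache_flags(["--enable-latent-cache", "--cache-mode", "read_write"])` yields a config with the cache enabled in read_write mode, while an empty argument list leaves it disabled.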


Comment thread telefuser/service/core/container.py
Comment on lines 24 to +25
def _build_cache_task_request(task_data: dict) -> SimpleNamespace:
"""Build a minimal task_request stub for the cache layer.

Splatting ``task_data`` directly would crash because ``TaskRequest`` is
``extra="allow"`` and may contain keys that are not valid Python
identifiers. The cache layer only reads ``task_id`` / ``task`` /
``prompt`` via ``getattr``, so we whitelist those.
"""
"""Build a minimal task_request stub for the cache layer."""

Labels

None yet

Projects

None yet


3 participants